Communications Medicine

63 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Molecular characterisation of a Klebsiella pneumoniae neonatal sepsis outbreak in a rural Gambian hospital: a retrospective genomic epidemiology investigation
2026-03-04 genetic and genomic medicine 10.64898/2026.03.03.26347025
Top 0.2% (2.8%)

Background: Klebsiella pneumoniae is a common cause of neonatal sepsis in Africa, and is frequently hospital acquired. We recently reported an outbreak of multidrug-resistant K. pneumoniae sepsis amongst neonates at a rural hospital in The Gambia, West Africa, involving 57 cases and case fatality of 60%. Here we undertook a retrospective pathogen genomic epidemiology study of clinical and environmental K. pneumoniae isolated during the outbreak, to identify the outbreak strain, refine the epidemic...

2
Shared multicellular injury programs of acute and chronic kidney disease enable mechanistic patient stratification
2026-03-06 nephrology 10.64898/2026.03.05.26347522
Top 0.4% (2.1%)

Acute kidney injury (AKI) and chronic kidney disease (CKD) are two interconnected clinical conditions, both defined by degree of functional impairment, but with heterogeneous clinical trajectories. Using new transcriptomic technologies, recent studies have described the cellular diversity in the healthy and injured kidney at the single cell level. Here, we used single nucleus transcriptomics to investigate the molecular diversity and commonalities in kidney biopsies from over 150 participants wi...

3
Enhancing Prediabetes Diagnosis from Continuous Glucose Monitoring Data via Iterative Label Cleaning and Deep Learning
2026-03-05 health informatics 10.64898/2026.03.04.26347604
Top 0.5% (2.0%)

As of early 2026, over 115 million US adults (more than 1 in 3) have prediabetes, a condition with an annual conversion rate of 5%-10% to type 2 diabetes. Total diabetes (diagnosed and undiagnosed) affects approximately 40.1 million Americans, or 12% of the population, with roughly 1.5 million new cases diagnosed annually. Continuous Glucose Monitoring (CGM) provides real-time, 24/7 insights into glycemic variability, detecting dangerous highs, lows, and trends that HbA1c (a 3-month average) mis...

4
Time of Day as an Unmeasured Confounder in Oncology Trials
2026-03-06 oncology 10.64898/2026.03.05.26347742
Top 0.8% (1.9%)

A recent randomized clinical trial in non-small cell lung cancer [1] confirms what numerous observational studies have reported: time of day (ToD) may dramatically influence treatment outcomes in cancer patients. In this recent trial, median overall survival (OS) decreased from 28 months in the early ToD arm to 16.8 months in the late ToD arm. We raise the concern that clinical trial outcomes may be influenced by seemingly minor biases in treatment time across arms. We also suggest that by measuring ...

5
How Low Could Semaglutide Prices Fall? An Analysis of Production Cost and Implications for Global Access Ahead of Patent Expiry
2026-03-04 endocrinology 10.64898/2026.03.04.26347508
Top 1% (1.5%)

Objectives: To estimate potential launch prices of generic semaglutide following patent expiry from 2026 and to quantify the global obesity and type 2 diabetes (T2DM) burden in countries where generic access may become possible. Methods: We used World Bank population data and World Obesity and Diabetes Atlas prevalence estimates to calculate obesity and T2DM burden in countries where semaglutide patents expire in 2026 or were not filed. Patent status was identified using MedsPaL and cross-checked w...

6
Utility of glucose, lipid and kidney function Trajectory Measures for incident Cardiovascular Disease risk prediction for people living with Type 2 Diabetes: a case-study using Danish registry data
2026-03-06 cardiovascular medicine 10.64898/2026.03.06.26347493
Top 2% (1.5%)

Introduction: Cardiovascular disease (CVD) is an important complication of type 2 diabetes (T2D). Current incident CVD-prediction models use single baseline measurements and achieve moderate performance in people with T2D, with C-indices around 0.7. Modern healthcare registries contain repeated measurements of HbA1c, LDL-cholesterol and eGFR, which could carry incremental predictive value. However, the added value of trajectory measures for CVD-risk prediction remains unclear. We aimed t...

7
Two-step deep-learning candidemia prediction model using two large time-sequence electronic health datasets
2026-03-04 infectious diseases 10.64898/2026.03.03.26347531
Top 3% (1.3%)

Background: Candidemia is a rare but life-threatening bloodstream infection that remains difficult to predict using conventional risk stratification approaches, highlighting the need for improved predictive strategies. As a result, empiric antifungal therapy is often delayed even in high-risk patients. Methods: We developed a deep learning model (PyTorch_EHR) to predict 7-day candidemia risk by using electronic health record data from two large cohorts (Houston Methodist Hospital System [HMHS] and ...

8
Gene to Morphology Alignment via Graph Constrained Latent Modeling for Molecular Subtype Prediction from Histopathology in Pancreatic Cancer
2026-03-06 oncology 10.64898/2026.03.05.26347711
Top 3% (1.2%)

Molecular subtyping of cancer is traditionally defined in transcriptomic space, yet routine clinical deployment is limited by the availability and cost of sequencing. Meanwhile, histopathology captures rich morphological information that is known to correlate with molecular state but lacks a principled, mechanistic bridge to gene-level representations. We propose a graph-constrained learning framework that aligns morphology-derived signals with a fixed, data-driven gene network discovered via hi...

9
Class imbalance correction in artificial intelligence models leads to miscalibrated clinical predictions: a real-world evaluation
2026-03-05 health informatics 10.64898/2026.03.04.26347634
Top 3% (1.1%)

Background: Predictive models employing machine learning algorithms are increasingly being used in clinical decision making, and improperly calibrated models can result in systematic harm. We sought to investigate the impact of class imbalance correction, a commonly applied preprocessing step in machine learning model development, on calibration and modelled clinical decision making in a large real-world context. Methods: A histogram boosted gradient classifier was trained on a highly imbalanced na...

10
OncoRAG: Graph-Based Retrieval Enabling Clinical Phenotyping from Oncology Notes Using Local Mid-Size Language Models
2026-03-06 oncology 10.64898/2026.03.05.26347717
Top 3% (1.1%)

Introduction: Manual data extraction from unstructured clinical notes is labor-intensive and impractical for large-scale clinical and research operations. Existing automated approaches typically require large language models, dedicated computational infrastructure, and/or task-specific fine-tuning that depends on curated data. The objective of this study is to enable accurate extraction with smaller locally deployed models using a disease-site specific pipeline and prompt configuration that are ...

11
Outburst of serotype 4 IPD after COVID-19 is driven by ST15063/GPSC162 lineage associated with high-risk behaviors and greater virulence linked to influenza H3N2 virus coinfection and cigarette smoke
2026-03-04 infectious diseases 10.64898/2026.02.27.26346872
Top 4% (1.0%)

The emergence of vaccine-covered serotypes causing invasive pneumococcal disease (IPD) is a serious concern worldwide. We investigated the unexpected rise of serotype 4 causing IPD primarily in non-vaccinated young adults after the COVID-19 pandemic that further spread to adults ≥65 years in recent years. For this purpose, we conducted a retrospective study of serotype 4 IPD cases (n=827) reported in Spain between 2009 and 2024. Whole-genome sequencing was performed to assess clonal lineag...

12
Performance of an Optimized Methylation-Protein Multi-Cancer Early Detection (MCED) Test Classifier
2026-03-04 oncology 10.64898/2026.03.03.26347329
Top 4% (1.0%)

Multi-cancer early detection (MCED) tests can detect several cancer types and stages. We previously developed a methylation and protein (MP V1) MCED classifier. In this study, we present a refined MP V2 classifier, developed by evaluating model architectures that improved performance in prospectively enrolled case-control cohorts under standard testing conditions. The newly developed MP V2 classifier was trained to be more generalizable and achieve increased early-stage sensitivity at a target s...

13
Genome-Wide Association Study of Creatinine Clearance Identifies New Loci for Kidney Function
2026-03-05 nephrology 10.64898/2026.03.04.26347652
Top 5% (1.0%)

Introduction: Genome-wide association studies (GWAS) for kidney function have mainly focused on creatinine-based glomerular filtration rate (eGFRcrea), which is affected by variation in muscle mass. Moreover, the genetic basis of the sexual dimorphism of chronic kidney disease is underexplored. Methods: We performed a GWA meta-analysis for creatinine clearance (CrCl), a muscle mass-independent kidney function phenotype, in 58,976 individuals of European descent from the Lifelines Cohort Study. Res...

14
A bootstrap particle filter for viral Rt inference and forecasting using wastewater data
2026-03-06 epidemiology 10.64898/2026.03.06.26347747
Top 5% (0.9%)

Wastewater is increasingly being recognized as an important data stream that can contribute to infectious disease surveillance and forecasting. With this recognition, a growing number of statistical inference approaches are being developed to use wastewater data to provide quantitative insights into epidemiological dynamics. However, few existing approaches have allowed for systematic integration of data streams for inference, for example by combining case incidence data and/or serological data ...

15
Analysis Of Clinicopathological Histomorphological And Molecular Differences In Right And Left Sided Colonic Carcinoma
2026-03-04 health systems and quality improvement 10.64898/2026.03.03.26347325
Top 5% (0.9%)

Background: Colorectal carcinoma (CRC) remains a significant cause of cancer morbidity and mortality worldwide. Right- and left-sided tumours differ in clinical, morphological, and molecular features. Microsatellite instability-high (MSI-H) tumours, often right-sided, are associated with distinct histopathological characteristics and prognostic implications. In Sri Lanka, molecular MSI testing is currently unavailable, highlighting the need for alternative predictive approaches. Objectives: General...

16
Mapping the Antimicrobial Susceptibility of Methicillin-Resistant Staphylococcus aureus in Western Ethiopia: A multicenter cross-sectional study
2026-03-06 infectious diseases 10.64898/2026.03.05.26347706
Top 6% (0.9%)

From 2021 to 2025, MRSA emerged as a major multidrug-resistant pathogen in the study area. Among 545 S. aureus isolates, 67.2% were MRSA, disproportionately affecting children under five (26.5%) and males (55.5%). Case incidence more than doubled by 2025, suggesting rising transmission or resistance. Most isolates were hospital-associated (85.2%), predominantly from outpatients (88.5%), with middle ear discharge as the main source (67%). Gentamicin showed the highest susceptibility (72.1%), whil...

17
Show Your Work: Verbatim Evidence Requirements and Automated Assessment for Large Language Models in Biomedical Text Processing
2026-03-04 health informatics 10.64898/2026.03.03.26346690
Top 6% (0.8%)

Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005 - 2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...

18
A spatial multi-omic portrait of survival outcome for clear cell renal cell carcinoma
2026-03-04 oncology 10.64898/2026.03.02.26347390
Top 7% (0.7%)

Clear cell renal cell carcinoma (ccRCC) is the leading cause of kidney cancer-related death, but how the tumor microenvironment shapes patient survival is not completely understood. Here, we describe the characterization of ccRCC tumor ecosystems from 498 patients using imaging mass cytometry with a focus on tumor, myeloid, and T cell landscapes. Data from more than 3 million single cells is analyzed using machine-learning to identify key ecosystem features that outperform basic clinical data fo...

19
The impact of patient ethnicity on cancer incidence following platelet count and C-reactive protein tests in English primary care: a cohort study of 5 million patients
2026-03-04 primary care research 10.64898/2026.03.03.26347503
Top 7% (0.7%)

Background: Platelet count and C-reactive protein (CRP) are blood tests commonly used in primary care as part of diagnostic work up for symptomatic patients. Abnormal results of these tests can indicate an undetected cancer; however, it is not known whether the association between an abnormal test result and cancer risk varies by patient ethnicity. Methods: This cohort study used routinely collected primary and secondary health care records in England with linkage to national cancer registry data. ...

20
Trustworthy personalized treatment selection: causal effect-trees and calibration in perioperative medicine
2026-03-04 health informatics 10.64898/2026.03.03.26347440
Top 7% (0.7%)

Background: Personalized medicine promises to tailor treatments to the individual, but it carries a hidden risk: mistaking statistical noise for actionable clinical insight. Current machine learning approaches often provide predictions, but fail to inform clinicians when those predictions are unreliable. Objective: Develop a deployment-readiness framework that integrates causal inference, interpretable effect-trees, and calibration assessment to distinguish actionable signal from unreliable variati...